What is the basis of selecting data in KNN (K-Nearest-Neighbor) weather generator?

The k-nearest neighbor (k-NN) weather generator was originally developed by Sharif and Burn (2007). k-NN, one of the simplest machine learning algorithms, finds the 'k' neighbours in the training set that are nearest to the query data according to a distance metric. The algorithm is described in detail in Buishand and Brandsma (2001), Gangopadhyay et al. (2005), and Lall and Sharma (1996). The k-NN technique is based on selecting a specified number of days from the historical record that are similar in characteristics to the day of interest. The vector of input data consists of p variables across q stations for each day of the historical record.
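A minimal sketch of the neighbor-selection step, assuming each day's p variables across q stations are flattened into one feature vector and that plain Euclidean distance is the similarity criterion (the cited papers often use weighted or Mahalanobis-type metrics instead):

```python
import numpy as np

def k_nearest_days(history, query, k):
    """Return indices of the k historical days closest to the query day.

    history : (n_days, p*q) array -- p variables across q stations per day
    query   : (p*q,) feature vector for the day of interest
    Euclidean distance is assumed here for illustration.
    """
    dists = np.linalg.norm(history - query, axis=1)
    return np.argsort(dists)[:k]

# Toy example: 5 hypothetical historical days, 3 features each
hist = np.array([[0.0, 1.0, 2.0],
                 [0.1, 1.1, 2.1],
                 [5.0, 5.0, 5.0],
                 [0.2, 0.9, 1.9],
                 [9.0, 9.0, 9.0]])
print(k_nearest_days(hist, np.array([0.0, 1.0, 2.0]), k=2))
```

The function and data here are illustrative only; in practice the query window is usually restricted to a seasonal band of days around the current calendar date.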

For a review of the major steps of the k-NN weather generator, you can refer to "Prediction of climate variables by comparing the k-nearest neighbor method and MIROC5 outputs in an arid environment", a complete paper accompanied by a complete tool named the KNN (K-Nearest-Neighbor) weather generator. The tool offers different options, such as selecting different numbers of weather variables; this is essential, as it lets you see how the method responds to different numbers of input variables.

Gangopadhyay et al. (2005) used the k-NN model for downscaling local-scale temperature and precipitation in the United States. The k-NN algorithm searches for analogues of a feature vector, based on similarity criteria, in the observed time series. k-NN is perhaps the simplest and most straightforward approach in machine learning, but as the number of attributes increases, the certainty of the prediction decreases, and with a narrow sample space it also suffers from high variance.

The major advantage of k-NN resampling approaches is that they do not require restrictive assumptions concerning the joint distribution of the different predictands. Therefore, they can easily be applied to the generation of non-normally distributed data. Because surface weather variables are sampled simultaneously from the historical record for a given analog day, the generated fields are physically realistic and consistent within each day (since they have already been observed).
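The consistency argument can be made concrete with a small sketch: when the whole observed vector for an analog day is copied at once, generated combinations of variables are limited to those actually observed. The data below are hypothetical:

```python
import numpy as np

# Hypothetical historical record: each row is one observed day with
# [precipitation (mm), Tmax (C), Tmin (C)] recorded together.
history = np.array([
    [12.0, 18.0,  9.0],   # wet, cool day
    [ 0.0, 31.0, 17.0],   # dry, hot day
    [ 5.0, 22.0, 12.0],   # moderate day
])

rng = np.random.default_rng(42)
analog = rng.integers(len(history))      # choose one analog day
generated_day = history[analog].copy()   # take ALL variables from that day

# Because the full observed vector is resampled, the generated day can
# never pair, say, heavy rain with an implausibly high Tmax: only
# observed combinations of variables can occur.
print(generated_day)
```

Sampling each variable independently would lose exactly this cross-variable (and cross-station) consistency.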

The fundamental idea of the k-NN (K-nearest neighbor) algorithm is to search for analogs of a feature vector (vector of variables for which analogs are sought) based on similarity criteria in the observed time series. In the weather generator model, the day immediately following the analog day is taken as the next day in the generated sequence, and the process is repeated.
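The generation loop described above can be sketched as follows. This is a simplified illustration, not the exact procedure of any cited paper: it assumes Euclidean distance and uniform random selection among the k neighbors, whereas Lall and Sharma (1996) and later work typically weight nearer neighbors more heavily with a decreasing-probability kernel:

```python
import numpy as np

def knn_generate(history, start_idx, n_days, k, seed=None):
    """Generate a synthetic sequence of n_days daily weather vectors.

    At each step: find the k historical days most similar to the current
    day, pick one analog at random, and take the day immediately
    following that analog as the next day in the generated sequence.
    """
    rng = np.random.default_rng(seed)
    current = history[start_idx]
    out = [current]
    for _ in range(n_days - 1):
        # Exclude the last historical day: it has no following day.
        dists = np.linalg.norm(history[:-1] - current, axis=1)
        neighbors = np.argsort(dists)[:k]
        analog = rng.choice(neighbors)
        current = history[analog + 1]    # day after the analog day
        out.append(current)
    return np.array(out)

# Toy run on a hypothetical 10-day, 2-variable record
hist = np.arange(20.0).reshape(10, 2)
seq = knn_generate(hist, start_idx=0, n_days=5, k=2, seed=0)
print(seq.shape)
```

Every generated day is a full observed day from the record, so the sequence inherits both within-day consistency and observed day-to-day transitions.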

Name: Hidden